Preamble: The purpose of this experiment was to investigate how plasticity induced size variation affects susceptibility to predation, or predator induced natural selection. Since rearing flies at lower temperatures increases body size, flies were reared at 3 temperatures to create variation in body size; 18.5, 21 and 24 degrees. To account for possible temperature specific adaptive advantage during the experiment (of flies or predator), temperature manipulation was only used during the rearing process. Approximately ~200 adult flies from each rearing temperature were collected and then subdivided into treatment and control groups. Treatment populations were exposed to a predator (15 spiders) and monitored for survival for up to 6 days.The respective control population was reared at the same temperature and of similar age, but were not exposed to a predator. Measurements of the wing, tibia, femur and thorax (all of which are expected to be important traits in predation risk) were used to assess size.

For our statistical approach, we will be emulating a similar analysis typically done for selection experiments. Typically, selection gradient analysis (multiple linear regression) is used to get selection estimates for each trait. The estimates that are output by the model are our selection gradient coeficients and act as estimates of how natural selection is acting on a trait. However, selection gradient analysis is typically done by using quadratic regressions that can help determine the type of selection(directional, stabilizing, or disruptive). For our analysis we will be only using linear regressions so we will only be able to identify if directional selection on a trait is occurring.

Standardizing the data of each trait will allow us to compare the estimates among traits and their effect on selection. In addition, te way selection gradient analysis is typically done in the field is converting the response variable (survivorship in our case) to relative fitness (fitness of individual/ mean(population fitness)), and plotting this against the standardized traits. In our analysis, we show both relative fitness and days survived. Although relative fitness is typically used in the field, most of our analyses used days survived as our response variable because this was more relavant to the questions we wanted to address.

In our presentation, we thought we would do our analysis with our response variable (survival) as a binomial response (dead or alive). However, we decided it was best to not throw out the data we got about number of days survived, so instead we used the number of days survived as our response variable.

Packages Required:

library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.4
library(tidyverse)
## ── Attaching packages ─────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ readr   1.3.1       ✔ stringr 1.4.0  
## ✔ ggplot2 3.1.0       ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'tibble' was built under R version 3.4.4
## Warning: package 'readr' was built under R version 3.4.4
## Warning: package 'purrr' was built under R version 3.4.4
## Warning: package 'dplyr' was built under R version 3.4.4
## Warning: package 'stringr' was built under R version 3.4.4
## Warning: package 'forcats' was built under R version 3.4.4
## ── Conflicts ────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(lattice)
library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## Use the command
##     lattice::trellis.par.set(effectsTheme())
##   to customize lattice options for effects plots.
## See ?efffectsTheme for details.
library(lme4)
## Warning: package 'lme4' was built under R version 3.4.4
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 3.4.4
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
pref_theme <- theme_classic() +
  theme(text = element_text(size = 15))
theme_set(pref_theme)

Reading in the dataset:

pred_data <- read.csv("FO_TempPlasticitySelection_complete_wide_April2019.csv")
pred_data <- na.omit(pred_data)
##summary(pred_data)

We can put each of the variables into their appropriate classes:

pred_data <-(pred_data %>% 
                      mutate(cage=as.factor(cage)) %>%
                      mutate(temperature=as.factor(temperature)) %>%
                      mutate(sex=as.factor(sex))) %>%
                      mutate(survival=as.numeric(survival))

We can also scale the data- this will be important when we want to interpret each of the traits in our model later on. The scale function has the general formula of trait measurement - mean(trait measurement)/standard deviation(trait measurement). Since we were analyzing each temperature group seperately and we were not including the non-predator treatments in our analysis, we grouped the data by temperature and treatment before standardizing our traits and calculating relative fitness.

pred_data <- pred_data %>% 
  group_by(treatment, temperature) %>% 
  mutate(stand_femur = scale(femur), stand_femur, 
         stand_tibia = scale(tibia), stand_tibia, 
         stand_wing = scale(wing), stand_wing, 
         stand_thorax = scale(thorax), stand_thorax,
         relfit = survival/mean(survival), relfit)

Visualizing the raw data. First we are looking at two plots that show survival is influenced by the presence of a predator, and secondly that the flies were successfully randomized to predator and non-predator groups (so the distribution of trait sizes should be the same).

##The presence of a predator influences the survivorship of flies
ggplot(pred_data, aes(survival)) + geom_histogram() + facet_wrap(~treatment)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##Trait sizes are equally randomized among predator and non-predator groups:

tib <- ggplot(pred_data, aes(tibia, colour = treatment)) + 
  geom_density() +
  facet_wrap(~temperature)

fem <- ggplot(pred_data, aes(femur, colour = treatment)) + 
  geom_density() +
  facet_wrap(~temperature)

wing <- ggplot(pred_data, aes(wing, colour = treatment)) + 
  geom_density() +
  facet_wrap(~temperature)

thor <- ggplot(pred_data, aes(thorax, colour = treatment)) + 
  geom_density() +
  facet_wrap(~temperature)
grid.arrange(tib, fem, wing, thor)

Since the flies were randomly distributed among the predator and non predator groups, we are just going to do out analysis with the predator populations. We are just using the predator cages because we are interested in looking at survival and there was next to no death in the non-predator cages.

Here, we are just subsetting out dataset to make three new dataframes that partion the data into the three temperature and predator groups.

##dataset with just the 18 degree, predator group
data_18_P <- filter(pred_data, temperature == "18", treatment == "P") %>% na.omit()
##dataset with just the 21 degree, predator group
data_21_P <- filter(pred_data, temperature == "21", treatment == "P") %>% na.omit()
##dataset with just the 24 degree, predator group
data_24_P <- filter(pred_data, temperature == "24", treatment == "P") %>% na.omit()

Next we are looking at the raw data plotted against the response variable (survival) to get a visual sense of how each trait may relate to survivability.

##Plots looking at the raw data against survival 

##for thorax
thor_surv <- ggplot(pred_data, aes(thorax, survival, colour = sex)) + geom_point() + geom_smooth()

##for wing
wing_surv <- ggplot(pred_data, aes(wing, survival, colour = sex)) + geom_point() + 
  geom_smooth()

##for femur
fem_surv <- ggplot(pred_data, aes(femur, survival, colour = sex)) + geom_point() + 
  geom_smooth()

##for tibia
tib_surv <- ggplot(pred_data, aes(tibia, survival, colour = sex)) + geom_point() + 
  geom_smooth()
grid.arrange(thor_surv, wing_surv, fem_surv, tib_surv)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

One thing we might also expect is that as certain traits get larger, others will as well (allometry). We can look at how correlated the responses are using the corr function.

##names(pred_data)
##creating a dataset that is just the 18 degree group, and all of the values for unscaled traits:
traits_18 <- pred_data[pred_data$temperature == "18",8:11]
traits_21 <- pred_data[pred_data$temperature == "21",8:11]
traits_24 <- pred_data[pred_data$temperature == "24",8:11]
cor(traits_18)
##            femur    thorax     tibia      wing
## femur  1.0000000 0.8281832 0.8285183 0.8394333
## thorax 0.8281832 1.0000000 0.7601161 0.9053355
## tibia  0.8285183 0.7601161 1.0000000 0.7822475
## wing   0.8394333 0.9053355 0.7822475 1.0000000
cor(traits_21)
##            femur    thorax     tibia      wing
## femur  1.0000000 0.8194699 0.8339236 0.8158630
## thorax 0.8194699 1.0000000 0.7612797 0.9178618
## tibia  0.8339236 0.7612797 1.0000000 0.7638408
## wing   0.8158630 0.9178618 0.7638408 1.0000000
cor(traits_24)
##            femur    thorax     tibia      wing
## femur  1.0000000 0.8731775 0.9036628 0.8732737
## thorax 0.8731775 1.0000000 0.8361377 0.9388418
## tibia  0.9036628 0.8361377 1.0000000 0.8456420
## wing   0.8732737 0.9388418 0.8456420 1.0000000

We can see that the traits are highly correlated with each other among all three temperatures. This may make it difficult to run a linear model to understand which traits affect survival independently. However, we can get a better idea of how correlated each trait is with survival using multiple linear regression models. If we run a multiple linear regression model with all traits, the multiple regression takes into account the covariance between traits. From there, we can plot the residuals of each trait against the reponse variable (survivorship) to get a better idea of how that trait relates to survivorship.

Lets start by looking at two multiple regression models: the full model with all standardized traits, and the model with just one predictor (we arbitrarily pick tibia).

##multiple regression model for the 18C group
lm_full_18 <- lm(survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = data_18_P)
lm_18fit <- lm(relfit ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = data_18_P)
##multiple regression model for the 21C group
lm_full_21 <- lm(survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = data_21_P)
##multiple regression model for the 24C group
lm_full_24 <- lm(survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = data_24_P)

We tried running this model with cage as a random effect, but due to the fact that there are only 3-4 different cages per temperature group, this was too small a size to include as a random effect. We included cage as a fixed effect instead.

lm_tibia_18 <- lm(survival ~ stand_tibia:sex, data = data_18_P)
##And then their summaries
summary(lm_full_18)
## 
## Call:
## lm(formula = survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + 
##     stand_thorax:sex + cage, data = data_18_P)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7657  0.3688  0.7219  0.8974  1.4396 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.19242    0.19324  26.870   <2e-16 ***
## cage2              0.11294    0.19595   0.576   0.5646    
## cage3              0.06373    0.19007   0.335   0.7375    
## stand_femur:sexF   0.10041    0.23393   0.429   0.6679    
## stand_femur:sexM   0.45984    0.22759   2.020   0.0439 *  
## sexF:stand_tibia  -0.22797    0.19730  -1.155   0.2484    
## sexM:stand_tibia  -0.15501    0.19714  -0.786   0.4320    
## sexF:stand_wing    0.07741    0.28706   0.270   0.7875    
## sexM:stand_wing   -0.11075    0.30505  -0.363   0.7167    
## sexF:stand_thorax -0.17622    0.24573  -0.717   0.4736    
## sexM:stand_thorax -0.14130    0.26035  -0.543   0.5875    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.661 on 515 degrees of freedom
## Multiple R-squared:  0.01856,    Adjusted R-squared:  -0.0005001 
## F-statistic: 0.9738 on 10 and 515 DF,  p-value: 0.4653
summary(lm_tibia_18)$coef
##                     Estimate Std. Error    t value      Pr(>|t|)
## (Intercept)       5.26433708 0.08908943 59.0904777 1.276467e-233
## stand_tibia:sexF -0.23524732 0.11208788 -2.0987758  3.631556e-02
## stand_tibia:sexM  0.03186422 0.11768156  0.2707664  7.866775e-01

We can see that the estimates for the tibia in each model are different, and the standard errors are somewhat different as well, which effects t values, etc. This means that tibia length is affected by the other predictor variables, which also confirms what the corr test told us.

The car package has a function that will compute the residual plots for all the variables in the model automatically:

avPlots(lm_full_18)

avPlots(lm_full_21)

avPlots(lm_full_24)

avplots_18 <- allEffects(mod=lm_full_18)
avplots_21 <- allEffects(mod=lm_full_21)
avplots_24 <- allEffects(mod=lm_full_24)

plot(avplots_18)

plot(avplots_21)

plot(avplots_24)

Acknowlegding that our traits are highly correlated, we are going to assess the full multiple regression anyway (but we are going to use Principle Component Analysis later to account for this attribute). In this case, the intercept will be the mean number of days survived (of just the predator group). Since we have scaled our data, each estimate for the traits will tell us how much of an increase or decrease in the number of days you’ll survive for one standard deviation away from the mean (i.e a -0.7mm increase in tibia length will increase survivorship by 0.5 days).

Multiple linear regression for 18C

##Summaries with survival and relative fitness as the reponse variable 
summary(lm_full_18)
## 
## Call:
## lm(formula = survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + 
##     stand_thorax:sex + cage, data = data_18_P)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7657  0.3688  0.7219  0.8974  1.4396 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.19242    0.19324  26.870   <2e-16 ***
## cage2              0.11294    0.19595   0.576   0.5646    
## cage3              0.06373    0.19007   0.335   0.7375    
## stand_femur:sexF   0.10041    0.23393   0.429   0.6679    
## stand_femur:sexM   0.45984    0.22759   2.020   0.0439 *  
## sexF:stand_tibia  -0.22797    0.19730  -1.155   0.2484    
## sexM:stand_tibia  -0.15501    0.19714  -0.786   0.4320    
## sexF:stand_wing    0.07741    0.28706   0.270   0.7875    
## sexM:stand_wing   -0.11075    0.30505  -0.363   0.7167    
## sexF:stand_thorax -0.17622    0.24573  -0.717   0.4736    
## sexM:stand_thorax -0.14130    0.26035  -0.543   0.5875    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.661 on 515 degrees of freedom
## Multiple R-squared:  0.01856,    Adjusted R-squared:  -0.0005001 
## F-statistic: 0.9738 on 10 and 515 DF,  p-value: 0.4653
summary(lm_18fit)
## 
## Call:
## lm(formula = relfit ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + 
##     stand_thorax:sex + cage, data = data_18_P)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91891  0.07112  0.13920  0.17303  0.27758 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.00118    0.03726  26.870   <2e-16 ***
## cage2              0.02178    0.03778   0.576   0.5646    
## cage3              0.01229    0.03665   0.335   0.7375    
## stand_femur:sexF   0.01936    0.04511   0.429   0.6679    
## stand_femur:sexM   0.08866    0.04388   2.020   0.0439 *  
## sexF:stand_tibia  -0.04396    0.03804  -1.155   0.2484    
## sexM:stand_tibia  -0.02989    0.03801  -0.786   0.4320    
## sexF:stand_wing    0.01493    0.05535   0.270   0.7875    
## sexM:stand_wing   -0.02135    0.05882  -0.363   0.7167    
## sexF:stand_thorax -0.03398    0.04738  -0.717   0.4736    
## sexM:stand_thorax -0.02725    0.05020  -0.543   0.5875    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3203 on 515 degrees of freedom
## Multiple R-squared:  0.01856,    Adjusted R-squared:  -0.0005001 
## F-statistic: 0.9738 on 10 and 515 DF,  p-value: 0.4653
summary(lm_full_21)
## 
## Call:
## lm(formula = survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + 
##     stand_thorax:sex + cage, data = data_21_P)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7213 -1.1335  0.7687  1.2738  2.9962 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.250373   0.270877  15.691  < 2e-16 ***
## cage2              0.406568   0.232597   1.748 0.081126 .  
## cage3              0.607803   0.229085   2.653 0.008244 ** 
## stand_femur:sexF   0.638493   0.268285   2.380 0.017715 *  
## stand_femur:sexM   0.427958   0.262891   1.628 0.104218    
## sexF:stand_tibia   0.370260   0.251771   1.471 0.142062    
## sexM:stand_tibia  -0.302258   0.226300  -1.336 0.182309    
## sexF:stand_wing    0.022464   0.378631   0.059 0.952715    
## sexM:stand_wing   -0.638395   0.383567  -1.664 0.096707 .  
## sexF:stand_thorax -1.012885   0.302913  -3.344 0.000892 ***
## sexM:stand_thorax  0.009081   0.335044   0.027 0.978389    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.92 on 470 degrees of freedom
## Multiple R-squared:  0.09778,    Adjusted R-squared:  0.07858 
## F-statistic: 5.094 on 10 and 470 DF,  p-value: 4.494e-07

Multiple linear regression for 21 group

summary(lm_full_24)
## 
## Call:
## lm(formula = survival ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + 
##     stand_thorax:sex + cage, data = data_24_P)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9651  0.2344  0.6179  0.9083  1.8905 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.15295    0.19921  25.867  < 2e-16 ***
## cage2             -0.01302    0.19984  -0.065  0.94807    
## cage3             -0.25462    0.20736  -1.228  0.21997    
## cage4              0.13960    0.19895   0.702  0.48316    
## stand_femur:sexF   0.09904    0.26663   0.371  0.71044    
## stand_femur:sexM  -0.18489    0.23929  -0.773  0.44003    
## sexF:stand_tibia   0.50618    0.23343   2.168  0.03053 *  
## sexM:stand_tibia   0.60458    0.23380   2.586  0.00995 ** 
## sexF:stand_wing    0.29577    0.32525   0.909  0.36353    
## sexM:stand_wing   -0.28911    0.36125  -0.800  0.42386    
## sexF:stand_thorax -0.95182    0.31328  -3.038  0.00249 ** 
## sexM:stand_thorax -0.28877    0.30227  -0.955  0.33980    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.639 on 587 degrees of freedom
## Multiple R-squared:  0.05212,    Adjusted R-squared:  0.03436 
## F-statistic: 2.934 on 11 and 587 DF,  p-value: 0.000864

Multiple linear regression for 24 group

From these models we can see that there are differences in the estimates for our variables between temperature groups which may suggest rearing temperatures or overall fly size have different fitness consequences.

Between models we not only see differences between the effects of phenotypic traits but also cage effects.

Next we are going to do some diagnostic checks for normality to see if our models are a good fit.

par(mfrow=c(2,2),mar=c(2,3,1.5,1),mgp=c(2,1,0))

plot(lm_full_18, id.n = 4)

plot(lm_full_21, id.n = 4)

plot(lm_full_24, id.n = 4)

From our diagnostic plots, we can see that our data violates the assumption of normality. And although a multiple linear regression model is what is typically used for selection experiments, we know that the traits are highly correlated with one another. For this reason, we are going to run a permutation test to determine how extreme our observed data is from what we might expect by chance.

##permutations test for 18C
set.seed(500)
numsims <- 1000
res_18 <- matrix(nrow = numsims, ncol = 11)
colnames(res_18) <- c("Intercept", "Cage 2", "Cage 3", "Stand_Femur:Sex_F", "Stand_Femur:Sex_M", "Stand_Tibia:Sex_F", "Stand_Tibia:Sex_M", "Stand_Wing:Sex_F", "Stand_Wing:Sex_M", "Stand_Thorax:Sex_F", "Stand_Thorax:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data_18_P))
  pdat_18 <- transform(data_18_P, survival = survival[perm])
  res_18[i, 1:11] <- coef(lm(survival[perm] ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = pdat_18))
}
colMeans(res_18)
##          Intercept             Cage 2             Cage 3 
##       5.187011e+00      -2.303924e-03      -2.348340e-03 
##  Stand_Femur:Sex_F  Stand_Femur:Sex_M  Stand_Tibia:Sex_F 
##       1.347864e-03       4.191995e-03       1.223393e-03 
##  Stand_Tibia:Sex_M   Stand_Wing:Sex_F   Stand_Wing:Sex_M 
##       5.141339e-03      -4.328124e-05      -1.414505e-02 
## Stand_Thorax:Sex_F Stand_Thorax:Sex_M 
##      -3.855398e-03       3.545259e-03
observed_18 <- coef(lm_full_18)

#permutation test for 21C
res_21 <- matrix(nrow = numsims, ncol = 11)
colnames(res_21) <- c("Intercept", "Cage 2", "Cage 3", "Stand_Femur:Sex_F", "Stand_Femur:Sex_M", "Stand_Tibia:Sex_F", "Stand_Tibia:Sex_M", "Stand_Wing:Sex_F", "Stand_Wing:Sex_M", "Stand_Thorax:Sex_F", "Stand_Thorax:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data_21_P))
  pdat_21 <- transform(data_21_P, survival = survival[perm])
  res_21[i, 1:11] <- coef(lm(survival[perm] ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = pdat_21))
}
colMeans(res_21)
##          Intercept             Cage 2             Cage 3 
##        4.730477059        0.011770939        0.006143139 
##  Stand_Femur:Sex_F  Stand_Femur:Sex_M  Stand_Tibia:Sex_F 
##       -0.004608265       -0.006380527        0.001183350 
##  Stand_Tibia:Sex_M   Stand_Wing:Sex_F   Stand_Wing:Sex_M 
##       -0.011978881        0.014455753       -0.007298784 
## Stand_Thorax:Sex_F Stand_Thorax:Sex_M 
##       -0.007122709        0.023783460
observed_21 <- coef(lm_full_21)

#permutation test for 24C
res_24 <- matrix(nrow = numsims, ncol = 12)
colnames(res_24) <- c("Intercept", "Cage 2", "Cage 3","Cage 4", "Stand_Femur:Sex_F", "Stand_Femur:Sex_M", "Stand_Tibia:Sex_F", "Stand_Tibia:Sex_M", "Stand_Wing:Sex_F", "Stand_Wing:Sex_M", "Stand_Thorax:Sex_F", "Stand_Thorax:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data_24_P))
  pdat_24 <- transform(data_24_P, survival = survival[perm])
  res_24[i, 1:12] <- coef(lm(survival[perm] ~ stand_femur:sex + stand_tibia:sex + stand_wing:sex + stand_thorax:sex + cage, data = pdat_24))
}
colMeans(res_24)
##          Intercept             Cage 2             Cage 3 
##       5.1661472001      -0.0053509363      -0.0020232679 
##             Cage 4  Stand_Femur:Sex_F  Stand_Femur:Sex_M 
##       0.0002968865      -0.0036519454      -0.0046614671 
##  Stand_Tibia:Sex_F  Stand_Tibia:Sex_M   Stand_Wing:Sex_F 
##       0.0056754002      -0.0019652896      -0.0071404677 
##   Stand_Wing:Sex_M Stand_Thorax:Sex_F Stand_Thorax:Sex_M 
##      -0.0026595536       0.0045536971       0.0075808869
observed_24 <- coef(lm_full_24)

perm_hists <- function(x, y) {
  par(mfrow=c(3,4),mar=c(2,3,1.5,1),mgp=c(2,1,0))
  for (i in 1:ncol(x)) {
    hist(x[,i], col = "gray", las = 1, main = names(x[1,i]))
    abline(v= y[i], col = "red")
  }
}

#results from permutation tests
perm_hists(res_18, observed_18)
perm_hists(res_21, observed_21)

perm_hists(res_24, observed_24)

PCA Analysis: Since the beginning of our analysis showed that are variables in our model were highly correlated, we decided to use PCA analysis in order to determine if non-size related variation seems to has a relationship with survival.

#pca for 18C
pca_18 <- prcomp(data_18_P[,12:15], scale. = TRUE)
summary(pca_18)
## Importance of components:
##                           PC1     PC2     PC3     PC4
## Standard deviation     1.8604 0.53687 0.39180 0.31156
## Proportion of Variance 0.8653 0.07206 0.03838 0.02427
## Cumulative Proportion  0.8653 0.93736 0.97573 1.00000
pca_18$rotation
##                     PC1        PC2        PC3         PC4
## stand_femur  -0.5041518  0.2113741 -0.8348522 -0.06460384
## stand_tibia  -0.4841482  0.7279445  0.4814870 -0.06218919
## stand_wing   -0.5098634 -0.3761831  0.1539827  0.75816547
## stand_thorax -0.5014686 -0.5328264  0.2179021 -0.64586684
data18_full <- data.frame(data_18_P, 
                        pca_18$x)

#pca for 21C
pca_21 <- prcomp(data_21_P[,12:15], scale. = TRUE)
summary(pca_21)
## Importance of components:
##                           PC1     PC2     PC3     PC4
## Standard deviation     1.8491 0.56882 0.40873 0.30024
## Proportion of Variance 0.8548 0.08089 0.04177 0.02254
## Cumulative Proportion  0.8548 0.93570 0.97746 1.00000
pca_21$rotation
##                     PC1        PC2         PC3         PC4
## stand_femur  -0.4988588  0.3884303 -0.77467571  0.01180286
## stand_tibia  -0.4879542  0.6122052  0.62159749  0.02687011
## stand_wing   -0.5082205 -0.4601767  0.08552093 -0.72293534
## stand_thorax -0.5047301 -0.5124101  0.07861469  0.69029210
data21_full <- data.frame(data_21_P, 
                        pca_21$x)

#pca for 24C
pca_24 <- prcomp(data_24_P[,12:15], scale. = TRUE)
summary(pca_24)
## Importance of components:
##                           PC1     PC2     PC3     PC4
## Standard deviation     1.9051 0.45536 0.32467 0.24054
## Proportion of Variance 0.9073 0.05184 0.02635 0.01446
## Cumulative Proportion  0.9073 0.95918 0.98554 1.00000
pca_24$rotation
##                     PC1        PC2         PC3         PC4
## stand_femur  -0.5002606 -0.3548390  0.78775372 -0.05720761
## stand_tibia  -0.4922463 -0.6335773 -0.59314476  0.06672847
## stand_wing   -0.5042676  0.4678598 -0.16088788 -0.70776869
## stand_thorax -0.5031373  0.5037622 -0.04169489  0.70095513
data24_full <- data.frame(data_24_P, 
                        pca_24$x)

##PCA models
lm18_pca <- lm(survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = data18_full)
summary(lm18_pca)
## 
## Call:
## lm(formula = survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, 
##     data = data18_full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5642  0.5171  0.7375  0.8624  1.2953 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.16754    0.20897  24.729   <2e-16 ***
## PC1          0.10371    0.08930   1.161    0.246    
## PC2         -0.08859    0.22582  -0.392    0.695    
## sexM         0.20672    0.28609   0.723    0.470    
## cage2        0.12766    0.19518   0.654    0.513    
## cage3        0.06910    0.18962   0.364    0.716    
## PC1:sexM    -0.21838    0.14125  -1.546    0.123    
## PC2:sexM     0.06003    0.33026   0.182    0.856    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.663 on 518 degrees of freedom
## Multiple R-squared:  0.01076,    Adjusted R-squared:  -0.002611 
## F-statistic: 0.8047 on 7 and 518 DF,  p-value: 0.5836
par(mfrow=c(2,2),mar=c(2,3,1.5,1),mgp=c(2,1,0))
plot(lm18_pca, id.n = 4)

lm21_pca <- lm(survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = data21_full)
summary(lm21_pca)
## 
## Call:
## lm(formula = survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, 
##     data = data21_full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7394 -1.1337  0.8516  1.2768  3.3124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.6844     0.3143  14.903  < 2e-16 ***
## PC1           0.1639     0.1233   1.329   0.1844    
## PC2           1.3518     0.2892   4.674 3.86e-06 ***
## sexM         -0.4900     0.3725  -1.315   0.1890    
## cage2         0.3486     0.2321   1.501   0.1339    
## cage3         0.6258     0.2281   2.744   0.0063 ** 
## PC1:sexM      0.1161     0.1798   0.645   0.5190    
## PC2:sexM     -1.0313     0.4338  -2.377   0.0178 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.928 on 473 degrees of freedom
## Multiple R-squared:  0.0843, Adjusted R-squared:  0.07075 
## F-statistic: 6.221 on 7 and 473 DF,  p-value: 5.495e-07
par(mfrow=c(2,2),mar=c(2,3,1.5,1),mgp=c(2,1,0))
plot(lm21_pca, id.n = 4)

lm24_pca <- lm(survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = data24_full)
summary(lm24_pca)
## 
## Call:
## lm(formula = survival ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, 
##     data = data24_full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8877  0.3333  0.6215  0.9252  1.8210 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.127913   0.223418  22.952   <2e-16 ***
## PC1          0.033583   0.079174   0.424    0.672    
## PC2         -0.636097   0.297594  -2.137    0.033 *  
## sexM         0.136486   0.277191   0.492    0.623    
## cage2        0.048334   0.199072   0.243    0.808    
## cage3       -0.254208   0.207829  -1.223    0.222    
## cage4        0.125384   0.198566   0.631    0.528    
## PC1:sexM     0.003469   0.126147   0.027    0.978    
## PC2:sexM     0.200376   0.431201   0.465    0.642    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.645 on 590 degrees of freedom
## Multiple R-squared:  0.04022,    Adjusted R-squared:  0.0272 
## F-statistic:  3.09 on 8 and 590 DF,  p-value: 0.002008
par(mfrow=c(2,2),mar=c(2,3,1.5,1),mgp=c(2,1,0))
plot(lm24_pca, id.n = 4)

In each of these models, PC1 captures mostly size-related variation. This is evident because we know that our traits are highly corrolated, and the outputs for each trait estimate in PC1 are very similar. We can also show this by plotting PC1 against each of the trait values (example for one temperature below). As such, PC2 we suspect captures mostly non-size related variation. An interesting attribute of the model outputs for PC2 (among all temperature groups) is the direction of variation for thorax and wing are the same, and the direction of variation for tibia and femur are also the same, but in opposition to thorax and wing.

(ggplot(data = data18_full, aes(stand_tibia, PC1, colour = sex)) 
  + geom_point())

(ggplot(data = data18_full, aes(stand_femur, PC1, colour = sex)) 
  + geom_point())

(ggplot(data = data18_full, aes(stand_thorax, PC1, colour = sex)) 
  + geom_point())

(ggplot(data = data18_full, aes(stand_wing, PC1, colour = sex)) 
  + geom_point())

##permutations for PCA analysis 
set.seed(500)
numsims <- 1000
res18_pca <- matrix(nrow = numsims, ncol = 8) 
colnames(res18_pca) <- c("Intercept", "PC1", "PC2", "Sex_M", "Cage_2", "Cage_3", "PC1:Sex_M", "PC2:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data18_full))
  pdat <- transform(data18_full, survival = survival[perm])
  res18_pca[i, 1:8] <- coef(lm(survival[perm] ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = pdat))
}

colMeans(res18_pca)
##     Intercept           PC1           PC2         Sex_M        Cage_2 
##  5.1913677624  0.0024687186  0.0060083565 -0.0077486724 -0.0014401452 
##        Cage_3     PC1:Sex_M     PC2:Sex_M 
## -0.0031340654 -0.0001021949  0.0043576437
observed18_pca <- coef(lm18_pca)

res21_pca <- matrix(nrow = numsims, ncol = 8) 
colnames(res21_pca) <- c("Intercept", "PC1", "PC2", "Sex_M", "Cage_2", "Cage_3", "PC1:Sex_M", "PC2:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data21_full))
  pdat <- transform(data21_full, survival = survival[perm])
  res21_pca[i, 1:8] <- coef(lm(survival[perm] ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = pdat))
}

colMeans(res21_pca)
##     Intercept           PC1           PC2         Sex_M        Cage_2 
##  4.7343089697 -0.0009321866 -0.0014797165  0.0032860730  0.0106222163 
##        Cage_3     PC1:Sex_M     PC2:Sex_M 
##  0.0050321318 -0.0011463898 -0.0217135352
observed21_pca <- coef(lm21_pca)

res24_pca <- matrix(nrow = numsims, ncol = 9) 
colnames(res24_pca) <- c("Intercept", "PC1", "PC2", "Sex_M", "Cage_2", "Cage_3", "Cage_4", "PC1:Sex_M", "PC2:Sex_M")
for (i in 1:numsims) {
  perm <- sample(nrow(data24_full))
  pdat <- transform(data24_full, survival = survival[perm])
  res24_pca[i, 1:9] <- coef(lm(survival[perm] ~ (PC1 + PC2 + sex)^2 - PC1:PC2 + cage, data = pdat))
}

colMeans(res24_pca)
##     Intercept           PC1           PC2         Sex_M        Cage_2 
##  5.1720144567  0.0026880640 -0.0120670958 -0.0151517621 -0.0037531980 
##        Cage_3        Cage_4     PC1:Sex_M     PC2:Sex_M 
## -0.0006045741  0.0017648975  0.0014792062  0.0086024274
observed24_pca <- coef(lm24_pca)

perm_hists(res18_pca, observed18_pca)
perm_hists(res21_pca, observed21_pca)

perm_hists(res24_pca, observed24_pca)

The output of our permutations using the PCA models corroborate what we saw with the output of our multiple regression models (i.e. more extreme values in the permuation tests had lower P-values).

Overall Conclusions

With this project, our goal was to answer these three questions: 1: Do specific traits have an influence on the length of time flies survived? 2: What is the direction of selection for these traits? 3: Does the degree of selection for each trait differ among rearing temperatures?

#Among our multiple regression models for each temperature group, there are differences in the strength of selection for each trait as well as their direction. Moreover, there are sex differences in the strength of selection for each trait. In addition, the effect of cage was larger than we anticipated given that these were meant to represent the replicates within the experiment. We know that there is a high degree of correlation among traits for all temperature groups, and our diagnostic plots demonstrate that these models are not good fits. For this reason, we would take any biological interpretations with caution. However, our permutation tests demonstrated that the "significant" results we found may not have been due to chance. Furthermore, any effect that we see for any trait is masked by size effect variation. 

#Our PCA correlation plots show that PC1 captures mostly size variation, leaving mostly non-size related variation in PC2 (and 3 and 4...). However, one of the limitations using PC analysis is that we are not able to capture the effects of each trait.